FFTs of Arbitrary Dimensions on GPUs

Authors

  • Xiaobai Sun
  • Nikos Pitsianis
Abstract

We present the fast Fourier transform (FFT) of arbitrary dimensions on the graphics processing unit (GPU). The FFT on GPUs exploits the architecture's image processing capability as well as its particular graphics and image rendering capacity, and it couples processing with rendering. We view the GPU as a special architecture that supports fine-granularity, two-dimensional (2D) memory accesses at the level of the application programming interface (API). These unique architectural features are exploited by mathematical and algorithmic means closely associated with the FFT, which plays an important role in signal and image processing and in scientific computing in general. At the kernel of the FFT on GPUs, i.e., at the level innermost to the native architecture, are the primitive array operations for the 2D FFT rather than the 1D FFT. The 2D array operations map naturally onto the architecture, and together they offer the potential for high performance. A lower- or higher-dimensional FFT is described in terms of the kernel operations in order to exploit the architecture at the application programming level. This algorithmic abstraction of the operation primitives and their compositions enables, in particular, the 2D twiddle scaling, which uses less memory space, and the 2D bit-reversal permutation, which exploits the GPU's distinctive memory access feature. The 2D FFT on GPUs is detailed in [3], where mixed-radix factorizations are also used to make further use of the memory resource. In this paper we turn the focus to FFTs of other dimensions on GPUs. We describe the FFT reformulation and data mappings, and we provide experimental results demonstrating that the 2D FFT performance carries over to the other FFTs as well.
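
The idea of describing a lower-dimensional FFT through 2D array operations can be illustrated with a small CPU-side sketch. The code below is not the authors' GPU implementation; it is the standard Cooley-Tukey index splitting, shown under the assumption that a length n1*n2 sequence is viewed as an n1-by-n2 array. The function names `dft` and `dft_via_2d` are hypothetical, and a naive DFT stands in for the fast kernel to keep the example short.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT; serves as reference and as the small building block."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_via_2d(x, n1, n2):
    """Length n1*n2 DFT computed on an n1-by-n2 array view of x (illustrative)."""
    n = n1 * n2
    assert len(x) == n
    # View the 1D input as a 2D array: a[j1][j2] = x[j1 + n1*j2].
    a = [[x[j1 + n1 * j2] for j2 in range(n2)] for j1 in range(n1)]
    # Step 1: length-n2 DFT along every row (index j2 becomes k2).
    a = [dft(row) for row in a]
    # Step 2: twiddle scaling, multiplying entry (j1, k2) by exp(-2*pi*i*j1*k2/n).
    a = [[a[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
          for k2 in range(n2)] for j1 in range(n1)]
    # Step 3: length-n1 DFT along every column (index j1 becomes k1).
    cols = [dft([a[j1][k2] for j1 in range(n1)]) for k2 in range(n2)]
    # Step 4: write the result back in 1D order, output index k = k2 + n2*k1.
    y = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            y[k2 + n2 * k1] = cols[k2][k1]
    return y
```

Calling `dft_via_2d(x, 3, 4)` on a length-12 input should agree with `dft(x)` up to rounding error. In a GPU setting, the row and column passes and the twiddle scaling would be the array operations mapped onto 2D memory; that mapping is the subject of the paper and is not shown here.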

Similar resources

Two Algorithms for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs

We describe two algorithms for computing multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system with distributed memory when problem sizes are so large that the data do not fit in the memory of the entire system. Instead, data reside on a parallel disk system and are brought into memory in sections. We use the Parallel Disk Model for implementation and analysis. The first me...

A geometric nonuniform fast Fourier transform

An efficient algorithm is presented for the computation of Fourier coefficients of piecewise-polynomial densities on flat geometric objects in arbitrary dimension and codimension. Applications range from standard nonuniform FFTs of scattered point data, through line and surface potentials in two and three dimensions, to volumetric transforms in three dimensions and higher. Input densities are s...

Multidimensional, Multiprocessor, Out-of-Core FFTs with Distributed Memory and Parallel Disks

We show how to compute multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system with distributed memory when problem sizes are so large that the data do not fit in the memory of the entire system. Instead, data reside on a parallel disk system and are brought into memory in sections. We use the Parallel Disk Model for implementation and analysis. Our method is a straightforwar...

Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, graphics processing units (GPUs) have been considered among the best candidates for running them. In recent years, power consumption has been considered as one of the metr...

Nufft-based Imaging of Vegetation on Graphic Cards

We present an algorithm for the fast tomography of vegetation, based on a Radon mathematical setting and on the combined use of advanced processing algorithms (nonuniform FFTs) and hardware resources (graphics processing units, GPUs). The algorithm performance is first estimated numerically, showing the favorable trade-off between faithfulness and speed, and highlighting the convenience of the ...

Publication date: 2007